
Conversation

timchepeleff commented Nov 20, 2025

For some services the steady-state read_timeout/write_timeout must stay extremely low (30–60 ms) because higher values would break the business logic and fallback behavior already built around those limits. Unfortunately, AUTH—especially on TLS-backed ElastiCache with IAM authentication—can routinely take 30–200 ms. With today’s broad timeout knobs, the only way to avoid spurious AUTH failures is to relax the read/write timeouts, which violates the SLA everywhere else.

This change adds an auth_timeout configuration option that applies only to the connection prelude’s AUTH steps (plain AUTH or HELLO). All other commands still use the existing read/write timeouts. That lets us bump the allowance for the handshake without touching the rest of the pipeline.
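A sketch of what configuration with the proposed knob might look like; `auth_timeout` is the option introduced by this PR (not a released redis-client option), and the endpoint URL and timeout values below are illustrative:

```ruby
# Sketch only: auth_timeout is the option proposed in this PR; the URL and
# numeric values are placeholders, not recommendations.
redis_config = RedisClient.config(
  url: "rediss://example.cache.amazonaws.com:6379", # hypothetical endpoint
  ssl: true,
  read_timeout: 0.03,   # 30 ms steady-state budget stays intact
  write_timeout: 0.03,
  auth_timeout: 0.5     # only the AUTH/HELLO prelude gets the larger budget
)
```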

Tests cover RESP2, RESP3, and mixed prelude cases, and the README now documents the new knob.

timchepeleff changed the title from "Example: Add auth timeout" to "Explicit Auth Timeout Parameter" Nov 20, 2025
byroot (Member) commented Nov 21, 2025

So I get the need here, and it makes sense.

However, rather than adding a very ad hoc auth_timeout, I think I'd prefer a more generic way to do this, potentially using middlewares, so that similar use cases can also be handled.

byroot commented Nov 21, 2025

potentially using middlewares,

I feel like if middlewares received the connection, it would allow changing the timeout for the duration of the prelude, and it would also allow fixing #261.

The problem is that passing a new parameter would break existing middlewares :/ But maybe I should bite the bullet and also pass a Hash as a third argument, so that it's easier to make it evolve in the future.

We should also be able to check the middleware signature when it is registered, which would allow failing early and asking users to update the signature; we might even be able to do some shenanigans to make old ones work in a degraded way.
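For illustration, a minimal sketch of that registration-time check (module and method names here are hypothetical, not redis-client API):

```ruby
# Hypothetical sketch: validate a middleware's call_pipelined signature at
# registration so outdated two-argument middlewares fail early.
module SignatureCheck
  REQUIRED_ARITY = 3 # (commands, config, context)

  def self.validate!(middleware)
    meth = middleware.instance_method(:call_pipelined)
    # Method#arity is negative when optional/rest arguments are present,
    # so only reject methods that can never receive three arguments.
    if meth.arity >= 0 && meth.arity < REQUIRED_ARITY
      raise ArgumentError,
            "#{middleware}#call_pipelined must accept (commands, config, context)"
    end
  end
end

# A middleware still using the old two-argument signature:
module OldStyleMiddleware
  def call_pipelined(commands, config); end
end

# A middleware updated to accept the extra context hash:
module NewStyleMiddleware
  def call_pipelined(commands, config, context = {}); end
end
```

With this shape, `SignatureCheck.validate!(OldStyleMiddleware)` raises at registration time, while the updated module passes.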

byroot commented Nov 21, 2025

Actually, I'm an idiot, client is already available to middlewares.

byroot commented Nov 21, 2025

Can you try something like:

module RedisAuthTimeoutMiddleware
  def call_pipelined(commands, _config)
    if commands.any? { |c| c.first == "AUTH" }
      read_timeout_was = client.read_timeout
      client.read_timeout = 1
      begin
        super
      ensure
        client.read_timeout = read_timeout_was
      end
    else
      super
    end
  end
end

timchepeleff (Author) commented
From what I can tell, the connection prelude never goes through middlewares. It’s still run directly on the raw connection (lib/redis_client.rb comment “The connection prelude is deliberately not sent to Middlewares”). Because of that, there’s no hook today to raise the timeout only for the AUTH handshake.

I’d happily switch to a middleware-based solution once the prelude is exposed to them; until then the auth_timeout knob is the only way to scope the higher budget without relaxing the global read/write timeouts.

The problem is that passing a new parameter would break existing middlewares :/ But maybe I should bite the bullet, and also pass a Hash as third argument so that it's easier to make it evolve in the future.

I handled that by adding the context hash but only passing it when a middleware’s method signature accepts a third argument. Existing two-arg middlewares still get exactly the same call, while new ones can opt into the extra context.
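A minimal sketch of that dispatch, assuming arity-based detection (the `invoke_middleware` helper and the example middlewares are hypothetical, not the actual patch):

```ruby
# Hypothetical sketch of backward-compatible middleware dispatch: the extra
# context hash is passed only when call_pipelined accepts a third argument.
def invoke_middleware(middleware, commands, config, context)
  meth = middleware.method(:call_pipelined)
  if meth.arity == 2
    # Legacy two-argument middleware: called exactly as before.
    middleware.call_pipelined(commands, config)
  else
    # Updated middleware: opts into the extra context.
    middleware.call_pipelined(commands, config, context)
  end
end

# A legacy middleware that knows nothing about the context hash:
legacy = Object.new
def legacy.call_pipelined(commands, config)
  [:legacy, commands, config]
end

# An updated middleware that receives it:
modern = Object.new
def modern.call_pipelined(commands, config, context)
  [:modern, context]
end
```

Here `invoke_middleware(legacy, ...)` hits the two-argument branch, so existing middlewares see exactly the call they always did.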

byroot commented Nov 21, 2025

From what I can tell, the connection prelude never goes through middlewares.

It very much does:

prelude = config.connection_prelude.dup

if id
  prelude << ["CLIENT", "SETNAME", id]
end

# The connection prelude is deliberately not sent to Middlewares
if config.sentinel?
  prelude << ["ROLE"]
  role, = @middlewares.call_pipelined(prelude, config) do
    @raw_connection.call_pipelined(prelude, nil).last
  end
  config.check_role!(role)
else
  unless prelude.empty?
    @middlewares.call_pipelined(prelude, config) do
      @raw_connection.call_pipelined(prelude, nil)
    end
  end
end

That comment is outdated.

I believe my example middleware would work, but if you have evidence it doesn't, please show me and I'll try to make the necessary change for it to work.

timchepeleff commented Nov 21, 2025 via email

timchepeleff commented Nov 21, 2025

Here's what worked! Thanks for the assist. I'll close this ticket:

module RedisAuthTimeoutMiddleware
  def call_pipelined(commands, config, &block)
    if commands.any? { |cmd| cmd&.first == "AUTH" || (cmd&.first == "HELLO" && cmd.include?("AUTH")) }
      Rails.logger.info("[RedisAuthTimeoutMiddleware] bumping read_timeout to 0.5 during AUTH prelude")
      old_timeout = client.read_timeout
      client.read_timeout = 0.5
      begin
        super
      ensure
        client.read_timeout = old_timeout
      end
    else
      super
    end
  end
end

RedisClient.register(RedisAuthTimeoutMiddleware)
require "benchmark"
require "redis_client"

url  = "..."
...

redis_config = RedisClient.config(
  url: url,
  username: opts[:username],
  password: opts[:password],
  ssl: true,
  read_timeout: 0.005,
  write_timeout: 0.005
)

connect_time = Benchmark.realtime do
  client = redis_config.new_client
  client.call("PING")
ensure
  client&.close
end

puts <<~MSG
  IAM token generation: #{(iam_time * 1000).round(2)} ms
  Redis connect + TLS + AUTH: #{(connect_time * 1000).round(2)} ms
MSG

{"time":"2025-11-21T20:43:54.586+00:00","pid":5866,"request_id":"process-0bfe453aac72e7de1f0b7178","message":"[RedisAuthTimeoutMiddleware] bumping read_timeout to 0.5 during AUTH prelude","user_id":"","log_level":"4","dd":{"trace_id":"0","span_id":"0"},"ddsource":["ruby"]}
IAM token generation: 0.45 ms
Redis connect + TLS + AUTH: 10.34 ms

byroot commented Nov 21, 2025

Perfect!

@byroot byroot closed this Nov 21, 2025